---
title: Production Best Practices
description: Avoid common pitfalls, optimize costs, and ensure reliable memory behavior in production.
sidebarTitle: Best Practices
---

Memory is powerful, but without careful configuration, it can lead to unexpected token consumption, behavioral issues, and high costs. This guide shows you what to watch out for and how to optimize your memory usage for production.

## Quick Reference

- **Default to automatic memory** (`enable_user_memories=True`) unless you have a specific reason for agentic control
- **Always provide `user_id`**; don't rely on the shared `"default"` user
- **Use cheaper models** for memory operations when using agentic memory
- **Implement pruning** for long-running applications
- **Monitor token usage** in production to catch memory-related cost spikes
- **Test with realistic data**: 100+ memories behave very differently than 5 memories

---

## The Agentic Memory Token Trap

**The Problem:** When you use `enable_agentic_memory=True`, every memory operation triggers a **separate, nested LLM call**. This architecture can cause token usage to explode, especially as memories accumulate.

Here's what happens under the hood:

1. User sends a message → Main LLM call processes it
2. Agent decides to update memory → Calls `update_user_memory` tool
3. **Nested LLM call fires** with:
   - Detailed system prompt (~50 lines)
   - ALL existing user memories loaded into context
   - Memory management instructions and tools
4. Memory LLM makes tool calls (add, update, delete)
5. Control returns to main conversation

**Real-world impact:**

```python
from agno.agent import Agent
from agno.models.openai import OpenAIChat

# Scenario: user with 100 existing memories
# (`db` is assumed to be an already-configured database instance)
agent = Agent(
    db=db,
    enable_agentic_memory=True,
    model=OpenAIChat(id="gpt-4o"),
)

# 10-message conversation where the agent updates memory 7 times:
# Normal conversation: 10 × 500 tokens = 5,000 tokens
# With agentic memory: (10 × 500) + (7 × 5,000) = 40,000 tokens
# Result: 8x the token usage
```

As memories accumulate, each memory operation gets more expensive. With 200 memories, a single memory update could consume 10,000+ tokens just loading context.
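
The arithmetic above can be turned into a rough estimator. This is a back-of-the-envelope sketch, not a measurement: `memory_call_tokens` and `estimate_conversation_tokens` are hypothetical helpers, and the default of 50 tokens per memory is an assumption implied by the scenario (100 memories ≈ 5,000 tokens per memory operation). Real numbers depend on how long your memories are and on the memory prompt itself.

```python
def memory_call_tokens(memory_count: int, tokens_per_memory: int = 50) -> int:
    """Approximate tokens consumed by one nested memory call.

    Assumption: the cost is dominated by loading every existing memory,
    at roughly `tokens_per_memory` tokens each (prompt overhead is ignored).
    """
    return memory_count * tokens_per_memory


def estimate_conversation_tokens(
    messages: int,
    tokens_per_message: int,
    memory_updates: int,
    memory_count: int,
    tokens_per_memory: int = 50,
) -> int:
    """Approximate total tokens for a conversation with agentic memory."""
    conversation = messages * tokens_per_message
    memory = memory_updates * memory_call_tokens(memory_count, tokens_per_memory)
    return conversation + memory


# The scenario above: 10 messages, 7 memory updates, 100 existing memories.
baseline = estimate_conversation_tokens(10, 500, 0, 100)
with_memory = estimate_conversation_tokens(10, 500, 7, 100)
print(baseline, with_memory, with_memory / baseline)  # 5000 40000 8.0
```

Plugging in your own message sizes and memory counts gives a quick sense of whether agentic memory is affordable before you measure real usage.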

## Mitigation Strategy #1: Use Automatic Memory

For most use cases, automatic memory is the better choice because it is significantly more efficient:

```python
# Recommended: Single memory processing after conversation
agent = Agent(
    db=db,
    enable_user_memories=True  # Processes memories once at end
)

# Only use agentic memory when you specifically need:
# - Real-time memory updates during conversation
# - User-directed memory commands ("forget my address")
# - Complex memory reasoning within the conversation flow
```

## Mitigation Strategy #2: Use a Cheaper Model for Memory Operations

If you do need agentic memory, use a less expensive model for memory management while keeping a powerful model for conversation:

```python
from agno.agent import Agent
from agno.memory import MemoryManager
from agno.models.openai import OpenAIChat

# Cheaper model for memory operations
memory_manager = MemoryManager(
    db=db,
    model=OpenAIChat(id="gpt-4o-mini")
)

# Expensive model for main conversations
agent = Agent(
    db=db,
    model=OpenAIChat(id="gpt-4o"),
    memory_manager=memory_manager,
    enable_agentic_memory=True
)
```

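This approach can cut memory-related costs substantially while maintaining conversation quality. The exact saving depends on the price gap between the two models and on how many tokens the memory calls consume.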
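
To see where the saving comes from, it can help to price the two kinds of calls separately. The sketch below is illustrative only: `blended_cost` is a hypothetical helper, and the prices are placeholder units, not real model pricing. It reuses the token counts from the earlier scenario (5,000 conversation tokens and 35,000 memory tokens).

```python
def blended_cost(
    conversation_tokens: int,
    memory_tokens: int,
    main_price_per_1k: float,
    memory_price_per_1k: float,
) -> float:
    """Cost when conversation and memory calls are billed at different rates."""
    conversation_cost = conversation_tokens / 1000 * main_price_per_1k
    memory_cost = memory_tokens / 1000 * memory_price_per_1k
    return conversation_cost + memory_cost


# Placeholder prices in arbitrary units per 1,000 tokens (NOT real pricing).
MAIN_PRICE = 10.0
CHEAP_PRICE = 1.0

same_model = blended_cost(5_000, 35_000, MAIN_PRICE, MAIN_PRICE)
split_models = blended_cost(5_000, 35_000, MAIN_PRICE, CHEAP_PRICE)
print(same_model, split_models)  # 400.0 85.0
```

Only the memory portion gets cheaper. The main conversation is still billed at the main model's rate, so the overall saving depends on how much of your traffic is memory operations.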

## Mitigation Strategy #3: Guide Memory Behavior with Instructions

Add explicit instructions to prevent frivolous memory updates:

```python
agent = Agent(
    db=db,
    enable_agentic_memory=True,
    instructions=[
        "Only update memories when users share significant new information.",
        "Don't create memories for casual conversation or temporary states.",
        "Batch multiple memory updates together when possible."
    ]
)
```

## Mitigation Strategy #4: Implement Memory Pruning

Prevent memory bloat by periodically cleaning up old or irrelevant memories:

```python
from datetime import datetime, timedelta

def prune_old_memories(db, user_id, days=90):
    """Remove memories that have not been updated in the last `days` days."""
    cutoff_timestamp = int((datetime.now() - timedelta(days=days)).timestamp())
    
    memories = db.get_user_memories(user_id=user_id)
    for memory in memories:
        if memory.updated_at and memory.updated_at < cutoff_timestamp:
            db.delete_user_memory(memory_id=memory.memory_id)

# Run periodically or before high-cost operations
prune_old_memories(db, user_id="john_doe@example.com")
```
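
Age-based pruning does not bound how many memories a very active user can accumulate. One complementary policy is a count cap that keeps only the most recently updated memories. The sketch below shows just the selection logic on plain data. `MemoryRecord` and `select_memories_to_prune` are hypothetical names standing in for whatever objects your database returns; the returned IDs would then be passed to a delete call such as the one used above.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class MemoryRecord:
    """Hypothetical stand-in for a stored memory (not a library class)."""
    memory_id: str
    updated_at: Optional[int]  # Unix timestamp in seconds, or None if unknown


def select_memories_to_prune(memories: List[MemoryRecord], max_count: int) -> List[str]:
    """Return IDs of the memories to delete so that at most `max_count` remain.

    Memories are ranked by `updated_at`, newest first; records without a
    timestamp are treated as oldest and are pruned first.
    """
    if max_count < 0:
        raise ValueError("max_count must be non-negative")
    newest_first = sorted(memories, key=lambda m: m.updated_at or 0, reverse=True)
    return [m.memory_id for m in newest_first[max_count:]]


records = [
    MemoryRecord("a", 1_700_000_000),
    MemoryRecord("b", None),
    MemoryRecord("c", 1_710_000_000),
    MemoryRecord("d", 1_705_000_000),
]
print(select_memories_to_prune(records, max_count=2))  # ['a', 'b']
```

Keeping the selection separate from the deletion makes the policy easy to test without touching a real database.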

## Mitigation Strategy #5: Set Tool Call Limits

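Prevent runaway memory operations by limiting tool calls per run. The limit counts every tool call the agent makes, not only memory updates, so leave headroom for your other tools: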

```python
agent = Agent(
    db=db,
    enable_agentic_memory=True,
    tool_call_limit=5  # Caps all tool calls, including memory operations
)
```

## Common Pitfalls

### The user_id Pitfall

**The Problem:** Forgetting to set `user_id` causes all memories to default to `user_id="default"`, mixing different users' memories together.

```python
# ❌ Bad: All users share the same memories
agent.print_response("I love pizza")
agent.print_response("I'm allergic to dairy")

# ✅ Good: Each user has isolated memories
agent.print_response("I love pizza", user_id="user_123")
agent.print_response("I'm allergic to dairy", user_id="user_456")
```

**Best practice:** Always pass `user_id` explicitly, especially in multi-user applications.
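
In a multi-user service, one way to enforce this is to validate the identifier before it ever reaches the agent. This is a minimal sketch, assuming your application resolves a user ID per request; `require_user_id` is a hypothetical helper, not part of the library.

```python
from typing import Optional


def require_user_id(user_id: Optional[str]) -> str:
    """Return a cleaned user ID, or raise if it is missing or the shared default."""
    cleaned = (user_id or "").strip()
    if not cleaned or cleaned == "default":
        raise ValueError("A real user_id is required; refusing to use the shared default.")
    return cleaned


print(require_user_id(" user_123 "))  # user_123

try:
    require_user_id(None)
except ValueError as exc:
    print(f"Rejected: {exc}")
```

The validated value can then be passed as `user_id=` in calls like the ones above, so a missing ID fails loudly instead of silently writing to the shared bucket.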

### The Double-Enable Pitfall

**The Problem:** Using both `enable_user_memories=True` and `enable_agentic_memory=True` doesn't give you both. Agentic mode overrides automatic mode.

```python
# ❌ Doesn't work as expected - automatic memory is disabled
agent = Agent(
    db=db,
    enable_user_memories=True,
    enable_agentic_memory=True  # This disables automatic behavior
)

# ✅ Choose one approach
agent = Agent(db=db, enable_user_memories=True)  # Automatic
# OR
agent = Agent(db=db, enable_agentic_memory=True)  # Agentic
```

### Memory Growth Monitoring

Track memory counts to catch issues early:

```python
from agno.agent import Agent

agent = Agent(db=db, enable_user_memories=True)

# Check memory count for a user
memories = agent.get_user_memories(user_id="user_123")
print(f"User has {len(memories)} memories")

# Alert if memory count is unusually high
if len(memories) > 500:
    print("⚠️ Warning: User has excessive memories. Consider pruning.")
```
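
For a fleet of users, the same check can run as a periodic report. The sketch below works on a plain mapping of user ID to memory count, which you would build from calls like the one above. `users_over_threshold` is a hypothetical helper, and the default threshold of 500 simply mirrors the example.

```python
from typing import Dict, List, Tuple


def users_over_threshold(
    memory_counts: Dict[str, int], threshold: int = 500
) -> List[Tuple[str, int]]:
    """Return (user_id, count) pairs above `threshold`, largest count first."""
    flagged = [(uid, n) for uid, n in memory_counts.items() if n > threshold]
    return sorted(flagged, key=lambda pair: pair[1], reverse=True)


# Example counts are made up for illustration.
counts = {"user_123": 42, "user_456": 812, "user_789": 501}
for user_id, count in users_over_threshold(counts):
    print(f"{user_id}: {count} memories, consider pruning")
```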
